Statistical potential-based amino acid similarity matrices for aligning distantly related protein sequences.

Authors

  • Yen Hock Tan
  • He Huang
  • Daisuke Kihara
Abstract

Aligning distantly related protein sequences is a long-standing problem in bioinformatics and a key to successful protein structure prediction. Its importance has grown recently in the context of structural genomics projects, because more and more experimentally solved structures are becoming available as templates for protein structure modeling. To this end, recent structure prediction methods employ profile-profile alignments, and various ways of aligning two profiles have been developed. More fundamentally, a better amino acid similarity matrix can improve the profile itself, thereby yielding more accurate profile-profile alignments. Here we have developed novel amino acid similarity matrices from knowledge-based amino acid contact potentials. Contact potentials are used because the propensity of a residue to contact other amino acids should be one of the most conserved features of each position in a protein structure. The derived amino acid similarity matrices are tested on benchmark alignments at three levels: family, superfamily, and fold. Compared to BLOSUM45 and other existing matrices, the contact potential-based matrices perform comparably in family-level alignments but clearly outperform them in fold-level alignments. The contact potential-based matrices perform even better when suboptimal alignments are considered. Comparing the matrices with one another revealed that the contact potential-based matrices are very different from BLOSUM45 and the other matrices, indicating that they are located in a different basin of the amino acid similarity matrix space.
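
The abstract does not say how the similarity scores are computed from the contact potentials. As a minimal sketch, assume that two amino acids are similar when their rows in a contact potential table are correlated, that is, when they show similar contact propensities toward all other residues. The Pearson correlation used here, the function names, and the four-residue `TOY_POTENTIAL` table are illustrative assumptions, not values or methods taken from the paper. The same correlation is reused to compare two matrices with each other.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / sqrt(var_x * var_y)


def similarity_from_contact_potential(potential):
    """Score each residue pair by correlating their contact-potential rows.

    ASSUMPTION: row correlation is one plausible derivation; the abstract
    does not specify the actual procedure.
    """
    n = len(potential)
    return [[pearson(potential[i], potential[j]) for j in range(n)]
            for i in range(n)]


def matrix_correlation(m1, m2):
    """Compare two symmetric matrices via their upper triangles."""
    n = len(m1)
    a = [m1[i][j] for i in range(n) for j in range(i, n)]
    b = [m2[i][j] for i in range(n) for j in range(i, n)]
    return pearson(a, b)


# HYPOTHETICAL toy contact potential (lower value = more favorable contact).
# These numbers are made up for illustration only.
RESIDUES = ["L", "V", "K", "D"]
TOY_POTENTIAL = [
    [-1.0, -0.8, 0.3, 0.4],   # L
    [-0.8, -0.7, 0.2, 0.3],   # V
    [0.3, 0.2, 0.5, -0.6],    # K
    [0.4, 0.3, -0.6, 0.6],    # D
]

similarity = similarity_from_contact_potential(TOY_POTENTIAL)

if __name__ == "__main__":
    for name, row in zip(RESIDUES, similarity):
        print(name, [round(score, 2) for score in row])
```

In this toy table the two hydrophobic residues (L and V) have nearly parallel rows and therefore receive a high similarity score, while L and K score negatively. A real matrix would use a full 20 x 20 potential and would typically be rescaled to integer scores before use in an alignment program.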

Similar resources

A 3D-1D Substitution Matrix for Protein

In protein fold recognition, a probe amino acid sequence is compared to a library of representative folds of known structure to identify a structural homolog. In cases where the probe and its homolog have clear sequence similarity, traditional residue substitution matrices have been used to predict the structural similarity. In cases where the probe is sequentially distant from its homolog, we ...

Estimation of Evolutionary Distance between Distantly Related Sequences of Amino Acids, Taking Account of Patterns of Amino Acid Replacement

When amino acid sequences are distantly related (for instance, when their identity is <0.30), it is difficult to estimate their evolutionary distance. A method called the “similarity distance method” (SD method) was developed to obtain maximum-likelihood estimates of evolutionary distance between amino acid sequences, on the basis of a given pattern of amino acid replacement. Computer simulation r...

A 3D-1D substitution matrix for protein fold recognition that includes predicted secondary structure of the sequence.

In protein fold recognition, a probe amino acid sequence is compared to a library of representative folds of known structure to identify a structural homolog. In cases where the probe and its homolog have clear sequence similarity, traditional residue substitution matrices have been used to predict the structural similarity. In cases where the probe is sequentially distant from its homolog, we ...

Development of a new glycan score matrix

Glycans are chains of monosaccharides also known as oligosaccharides. Since glycans consist of monosaccharides having multiple hydroxyl groups which bind with potentially multiple other monosaccharides, glycans have very complicated structures compared to nucleic acid or protein sequences. The complexity is complicated further by various glycosidic linkage patterns which vary according to anome...

The random character of protein evolution and its effects on the reliability of phylogenetic information deduced from amino acid sequences and compositions.

Because evolution occurs by random events, the actual number of substitutions that occur in any period is not exactly equal to the number expected from the mean rate of substitution, but is statistically distributed about it. In consequence, even if rates of evolution are constant in different lineages, 'trees' deduced from descendant protein sequences contain random errors. When there are fewe...

Journal:
  • Proteins

Volume 64, Issue 3

Pages: -

Publication date: 2006